Wikipedia Mining for an Association Web Thesaurus Construction

Authors

  • Kotaro Nakayama
  • Takahiro Hara
  • Shojiro Nishio

Abstract

Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency Inversed Backward link Frequency), and its extension, “forward / backward link weighting (FB weighting)”, in order to construct a huge-scale association thesaurus. We demonstrated the effectiveness of our proposed methods in comparison with conventional methods such as co-occurrence analysis and TF-IDF.
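
The abstract names pfibf but does not define it. As a rough illustration only, the sketch below assumes that "path frequency" sums a 1/length contribution for every short link path between two articles, and that "inversed backward link frequency" is log(N / number of incoming links), by analogy with TF-IDF. The function name pfibf_scores, the 1/length decay, the hop limit and the toy link graph are all assumptions made for this example, not the authors' definitions.

```python
import math
from collections import defaultdict

def pfibf_scores(links, source, max_hops=2):
    """Score articles reachable from `source` with a pfibf-style weight.

    links: dict mapping an article title to the set of titles it links to.
    This is a hypothetical sketch, not the formulation used in the paper.
    """
    # Every article that appears as a link source or a link target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    # Backward link count: number of distinct articles linking to each article.
    backward = defaultdict(int)
    for targets in links.values():
        for target in set(targets):
            backward[target] += 1

    # Path frequency: each simple path of length <= max_hops adds 1 / length,
    # so shorter paths contribute more (assumed decay function).
    pf = defaultdict(float)

    def walk(node, visited, depth):
        if depth == max_hops:
            return
        for nxt in links.get(node, ()):
            if nxt in visited:
                continue
            pf[nxt] += 1.0 / (depth + 1)
            walk(nxt, visited | {nxt}, depth + 1)

    walk(source, {source}, 0)

    # Inverse backward link frequency damps articles that many pages link to.
    return {title: pf[title] * math.log(n / backward[title]) for title in pf}

# Toy link graph (made up for illustration).
toy_links = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}, "D": set()}
scores = pfibf_scores(toy_links, "A")
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, C is reached by two paths from A but is also linked from two articles, so its inverse backward link factor is smaller than that of B, which illustrates how heavily linked articles are down-weighted in the same spirit as common terms under TF-IDF.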

Similar resources

Extracting Structured Knowledge for Semantic Web by Mining Wikipedia

Since Wikipedia has become a huge-scale database storing a wide range of human knowledge, it is a promising corpus for knowledge extraction. A considerable amount of research on Wikipedia mining has been conducted, and the fact that Wikipedia is an invaluable corpus has been confirmed. Wikipedia’s impressive characteristics are not limited to the scale, but also include the dense link structure...

A Search Engine for Browsing the Wikipedia Thesaurus

Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In our previous work, we proposed link structure mining algorithms to extract a huge-scale and accurate association thesaurus from Wikipedi...

Association Thesaurus Construction for Interactive Query Expansion Based on Association Rule Mining

This paper presents an interactive query expansion method with an association thesaurus, which is mined from the ‘selected web pages’ of users in the query logs. The ‘selected web pages’ of users are converted into ‘sets of query terms’ and then used for term correlation mining. Accordingly, various association thesauruses concerning different query terms are constructed from these term correlat...

Mining Enterprise Websites for Association Thesaurus Construction

Enterprise websites are useful resources for obtaining information about the products and services of companies. Typically on these websites, a product is associated with a Web page, and related products are connected by hyperlinks. As a result, the connectivity graph of an enterprise website exposes the company’s products (nodes) and how they are associated (links). This paper presents a novel appro...

Automatic Topic Ontology Construction Using Semantic Relations from WordNet and Wikipedia

Due to the explosive growth of web technology, a huge amount of information is available as web resources over the Internet. Therefore, in order to access relevant content from these web resources effectively, considerable attention is paid to the semantic web for efficient knowledge sharing and interoperability. Topic ontology is a hierarchy of a set of topics that are interconnected using s...

Publication date: 2007